A New Approach for Dimensionality Reduction: Theory
Abstract
This paper applies Whitney's embedding theorem to the data reduction problem and introduces a new approach motivated in part by the (constructive) proof of the theorem. The notion of a good projection is introduced, which involves picking projections of the high-dimensional system that are optimized such that they are easy to invert. The basic theory of the approach is outlined, and algorithms for finding the projections are presented and applied to several test cases. A method for constructing the inverse projection is detailed and its properties, including a new measure of complexity, are discussed. Finally, well-known methods of data reduction are compared with our approach within the context of Whitney's theorem.

1 The Data Reduction Problem

Assume that the data set $A$ of interest is a discrete sampling of a compact $m$-dimensional submanifold $U$ embedded initially in an ambient vector space $\mathbb{R}^q$ of high dimension (minimally $q > 2m + 1$, but it is generally much larger). Given that the $q$-dimensional ambient space is redundant and an over-parametrization of the data, the aim of reduction is to determine a new parametrization which more closely reflects the intrinsic dimension of the data.

Our basic approach to this problem is to find a smooth embedding of the data set $B \subset V$ such that the mapping $G : A \to B$ is a diffeomorphism. Furthermore, $V$ is, by extrapolation, a re-embedding of $U$ and resides in a space of lower dimension, i.e., $V \subset \mathbb{R}^p$. Thus we seek a mapping $G : U \to V$ which will retain the differential structure of the data set in its reduced space. In practice the mapping $G$ will be determined by the data and specifically will be chosen such that the inverse mapping $H$ is especially well-conditioned. The resulting composition of mappings $H \circ G$ should closely approximate the identity mapping. The ability to construct an inverse $H$ is central to our approach in that it assures us that no data has been lost in the procedure and that our re-embedded manifold $V$ possesses all of the information contained in $U$.

The approach to the reduction problem presented in this paper is motivated largely by the (easy) Whitney embedding theorem [8]. This theorem demonstrates that generically a projection of any $m$-dimensional manifold is invertible provided the dimension of the range of the projection is no smaller than $2m + 1$. Given this flexibility, we propose to examine …
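As a rough illustration of the idea that some projections are easier to invert than others, the following Python sketch samples a closed curve ($m = 1$) in $\mathbb{R}^{10}$, projects it to $\mathbb{R}^3$ (that is, $p = 2m + 1$) with random orthonormal projections, and keeps the projection that least shrinks the secants between sample points. The test curve, the secant-ratio score, the random search, and all function names are assumptions made for this illustration; they are not the algorithms referred to in the abstract.

```python
import math
import random


def sample_curve(n, q):
    """Sample n points of a closed curve (m = 1) in R^q built from Fourier harmonics."""
    pts = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        # Coordinates 0 and 1 are (cos t, sin t), so the curve has no self-intersections.
        pts.append([
            math.cos((k // 2 + 1) * t) if k % 2 == 0 else math.sin((k // 2 + 1) * t)
            for k in range(q)
        ])
    return pts


def random_orthonormal_rows(p, q, rng):
    """Draw p orthonormal vectors in R^q by Gram-Schmidt on Gaussian samples."""
    rows = []
    while len(rows) < p:
        v = [rng.gauss(0.0, 1.0) for _ in range(q)]
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # discard the (unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def min_secant_ratio(points, rows):
    """Smallest ratio |P(x - y)| / |x - y| over all pairs of sample points.

    Assumed proxy for invertibility: a value near 0 means two distinct points
    nearly collide after projection; a value near 1 means secants are preserved.
    """
    worst = float("inf")
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            s = [a - b for a, b in zip(points[i], points[j])]
            ps = [sum(r_k * s_k for r_k, s_k in zip(r, s)) for r in rows]
            ratio = math.sqrt(sum(c * c for c in ps)) / math.sqrt(sum(c * c for c in s))
            worst = min(worst, ratio)
    return worst


def best_random_projection(points, p, trials, seed=0):
    """Keep the best-scoring projection out of `trials` random candidates."""
    rng = random.Random(seed)
    q = len(points[0])
    best_score, best_rows = -1.0, None
    for _ in range(trials):
        rows = random_orthonormal_rows(p, q, rng)
        score = min_secant_ratio(points, rows)
        if score > best_score:
            best_score, best_rows = score, rows
    return best_score, best_rows


if __name__ == "__main__":
    data = sample_curve(60, 10)
    score, _ = best_random_projection(data, p=3, trials=50, seed=0)
    print("best minimum secant ratio:", score)
```

A strictly positive score means no two sample points are mapped to the same image, so the projection is invertible on the sample. Random search is used here only for brevity; a direct optimization over projections would be the natural replacement.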
Similar resources
A New Approach for Knowledge Based Systems Reduction using Rough Sets Theory (RESEARCH NOTE)
The problem of knowledge analysis for decision support systems is the most difficult task of information systems. This paper presents a new approach based on notions of the mathematical theory of Rough Sets to solve this problem. Using these concepts, a systematic approach has been developed to reduce the size of the decision database and extract a reduced rule set from vague and uncertain data. The method ha...
A Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters
Redundant and irrelevant features in high dimensional data increase the complexity in underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...
Integration and Reduction of Microarray Gene Expressions Using an Information Theory Approach
The DNA microarray is an important technique that allows researchers to analyze many gene expression data in parallel. Although the data can be more significant if they come out of separate experiments, one of the most challenging phases in the microarray context is the integration of separate expression level datasets that have been gathered through different techniques. In this paper, we prese...
Diagnosis of Diabetes Using an Intelligent Approach Based on Bi-Level Dimensionality Reduction and Classification Algorithms
Objective: Diabetes is one of the most common metabolic diseases. Earlier diagnosis of diabetes and treatment of hyperglycemia and related metabolic abnormalities is of vital importance. Diagnosis of diabetes via proper interpretation of the diabetes data is an important classification problem. Classification systems help the clinicians to predict the risk factors that cause the diabetes or pre...
A Supervised Probabilistic Principal Component Analysis Mixture Model in a Lossless Dimensionality Reduction Framework for Face Recognition
In this paper, we first propose the supervised version of the probabilistic principal component analysis mixture model. Then, we consider a learning predictive model with projection penalties as an approach for dimensionality reduction without loss of information for face recognition. In the proposed method, first a local linear underlying manifold of data samples is obtained using the supervised...
2D Dimensionality Reduction Methods without Loss
In this paper, several two-dimensional extensions of principal component analysis (PCA) and linear discriminant analysis (LDA) techniques have been applied in a lossless dimensionality reduction framework for face recognition. In this framework, the benefits of dimensionality reduction were used to improve the performance of its predictive model, which was a support vector machine (...
Publication date: 1998